Efficient Learning of Semi-structured Data from Queries
Abstract
This paper studies the polynomial-time learnability of the classes of ordered gapped tree patterns (OGT) and ordered gapped forests (OGF) under the into-matching semantics in the query learning model of Angluin. The class OGT is a model of semi-structured database query languages, and it generalizes both the class of ordered/unordered tree pattern languages and the class of non-erasing regular pattern languages. First, we present a polynomial-time learning algorithm for μ-OGT, the subclass of OGT without repeated tree variables, using equivalence queries and membership queries. By extending this algorithm, we present polynomial-time learning algorithms for the class μ-OGF of forests without repeated variables and the class OGT of trees with repeated variables, using equivalence queries and subset queries. We also give representation-independent hardness results which indicate that both equivalence and membership queries are necessary to learn μ-OGT.
Similar resources
Efficient Learning of Semi-structured Data from Queries
This paper studies the learning complexity of classes of structured patterns for HTML/XML trees in the query learning framework of Angluin. We present polynomial-time learning algorithms for ordered gapped tree patterns, OGT, and ordered gapped forests, OGF, under the into-matching semantics using equivalence queries and subset queries. As a corollary, the learnability with equivalence and mem...
Enhancing passage retrieval in log files by query expansion based on explicit and pseudo relevance feedback
Passage retrieval is usually defined as the task of searching for passages which may contain the answer for a given query. While these approaches are very efficient when dealing with texts, applied to log files (i.e. semi-structured data containing both numerical and symbolic information) they usually provide irrelevant or useless results. Nevertheless, one appealing way for improving the results ...
Normalization and Learning of Transducers on Trees and Words. (Normalisation et Apprentissage de Transducteurs d'Arbres et de Mots)
Since the arrival of the Web, various kinds of semi-structured data formats have been introduced in the areas of computer science and technology relevant for the Web, such as document processing, database management, knowledge representation, and information exchange. The most recent technologies for processing semi-structured data rely on the formats JSON and RDF. The main questions there are how t...
Georeferencing Semi-Structured Place-Based Web Resources Using Machine Learning
In recent years, shared content on the web has grown significantly. A large part of this information is publicly available in the form of semi-structured data. Moreover, a significant amount of this information is related to place. Such information refers to a location on the earth but does not contain any explicit coordinates. In this research, we tried to georefer...
Evaluation of Top-k Queries over Structured and Semi-structured Data
Publication date: 2007